Lightweight Logging and Recovery for Distributed Shared Memory over Virtual Interface Architecture
Abstract
As software Distributed Shared Memory (DSM) systems become attractive on larger clusters, attention is shifting toward improving their reliability. In this paper, we propose a lightweight logging scheme, called remote logging, and a recovery protocol for home-based DSM. Remote logging stores coherence-related data in the volatile memory of a remote node. The logging overhead is kept moderate by a high-speed system area network and the user-level DMA operations supported by modern communication protocols. Remote logging tolerates multiple failures as long as the backup nodes of the failed nodes are alive, which substantially improves the reliability of the DSM. Experimental results show that our fault-tolerant DSM has low overhead compared to conventional stable logging and that it can recover effectively from some concurrent failures.
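The failure-tolerance condition stated in the abstract (multiple failures are tolerated if the backup nodes of the failed nodes are alive) can be illustrated with a minimal sketch. The ring-style backup assignment and the names `ring_backups` and `recoverable` are assumptions made for illustration only; the abstract does not say how backup nodes are chosen or how the protocol is implemented.

```python
def ring_backups(n):
    """Hypothetical backup assignment: node i logs to node (i + 1) mod n.

    This mapping is an assumption for illustration; the abstract does not
    specify how backup nodes are selected.
    """
    return {i: (i + 1) % n for i in range(n)}


def recoverable(backup_of, failed):
    """Return True if every failed node's backup node is still alive."""
    failed = set(failed)
    return all(backup_of[node] not in failed for node in failed)


backups = ring_backups(4)
print(recoverable(backups, {0, 2}))  # True: backups 1 and 3 are alive
print(recoverable(backups, {0, 1}))  # False: node 0's backup (node 1) also failed
```

Under this assumed assignment, two concurrent failures are survivable only when the failed nodes are not neighbors in the ring, which matches the abstract's claim that some, not all, concurrent failures can be recovered.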
Similar resources
A Lightweight Causal Logging Scheme for Recoverable Distributed Shared Memory Systems
This paper presents a new causal logging scheme for lazy release consistent distributed shared memory systems. For the efficient implementation of causal logging, data structures and operations supported by the lazy release consistency memory model are utilized. Also, unlike the previous scheme which logs the vector clock for each synchronization operation, the proposed scheme adds the minimum in...
Practical Schemes using Logs for Lightweight Recoverable DSM
In the existing Fault-Tolerant Software Distributed Shared Memory (FT-SDSM) with the message logging, the logs are used only to recover the failed nodes. In our previous work, we have implemented a lightweight logging protocol, called remote logging, on the SDSM for fault tolerance, which incurs low logging overhead with a fast network and a remote memory for back-up data. In this paper, we pro...
Software Distributed Shared Memory over Virtual Interface Architecture: Implementation and Performance
Architectural Issues in Adopting Distributed Shared Memory for Distributed Object Management Systems
Distributed shared memory (DSM) provides a transparent network interface based on the memory abstraction. Furthermore, DSM gives us the ease of programming and portability. Also the advantages offered by DSM include low network overhead, with no explicit operating system intervention to move data over the network. With the advent of high-bandwidth networks and wide addressing, adopting DSM for distrib...
Lazy Logging and Prefetch-Based Crash Recovery in Software Distributed Shared Memory Systems
In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called the prefetch-based crash recovery (PCR), for software distributed shared memory (SDSM). Our lazy logging protocol minimizes failure-free overhead by logging only data indispensable for correct recovery, while our PCR protocol reduces the recovery time by prefetching data ...